Data analysis

Visualising and data tidying using R

The Course Structure

Week | Topic | Assessment | Weight (%)
1 | Visualising and data tidying using R | - | -
2 | Introduction to Quarto | - | -
3 | Regression Modelling | - | -
4 | Model assessment | Moodle Quiz 1 (online) | 15
5 | Class test | 🖥️ Class test (in person) | 25
6 | Generalised linear Models part 1 | - | -
7 | Generalised linear Models part 2 | Peer review (submission phase) | 20
8 | Introduction to version control | Moodle Quiz 2 (online) | 15
9 | Collaborative coding | Peer review (evaluation phase) | -
10 | Drop-in session (Group projects) | - | -
11 | - | Group projects | 25

Assessment calendar

Please refer to Moodle for the detailed assessment calendar

Release week | Topics | Type of assessment | Weight (%) | Due date
Week 3 (Jan 30) | Weeks 1-3 | Moodle Quiz 1 | 15 | Week 4 (Feb 06) 11:00 am BST
Week 5 (Feb 13) | Weeks 1-4 | Class test | 25 | Week 5 (Feb 13) 11:00 am BST
Week 2 (Jan 23) | Information during Week 2 | Peer assessment | 20 | Week 7 (Feb 27) 11:00 am BST (end of submission stage); Week 9 (Mar 13) 11:00 am BST (end of evaluation stage)
Week 8 (Mar 06) | Weeks 6-7 | Moodle Quiz 2 | 15 | Week 8 (Mar 06) 11:00 am BST
Week 6 (Feb 20) | Weeks 1-10 | Group project | 25 | Week 11 (Mar 27) 11:00 am BST

Lab structure

How to Approach Each Lab

  1. Overview
    • Brief introduction/recap at the beginning of each lab
    • Understand what concepts and tools you’ll practice
  2. Work Through Exercises
    • Work at your own pace through the self-contained notes
    • Attempt all the tasks (focus on understanding, not just finishing)
  3. Check Against Solutions
    • Compare your approach to the provided solutions
    • Don’t just copy - try to understand why it works
  4. Ask for Help 🙋‍♂️
    • If you’re stuck for more than ~5 minutes, ask for help

Week 1: Visualising and data tidying using R

Overview

We’ll revisit key concepts from your previous R programming course and build on them with more advanced methods for data manipulation and plotting.

ILOs for today:

  • Use tools from the tidyverse and ggplot2 packages to manipulate and visualise data in R, including categorical variables.
  • Understand the concept of tidy data and apply tidyverse tools to structure datasets and join datasets accordingly.
  • Perform data wrangling tasks using tidyverse functions to prepare data for analysis and visualisation.

Load libraries & data

We shall now load into R all of the libraries we will need for this session. This can be done by typing the following into your R script:

library(ggplot2)
library(tidyverse)
library(nycflights13)
library(fivethirtyeight)
  1. The first library ggplot2 allows us to use functions within that package in order to create nicer data visualisations.
  2. The second library tidyverse is a collection of packages for data manipulation and visualisation (it already includes ggplot2).
  3. The final two libraries (nycflights13 and fivethirtyeight) contain interesting data sets that we shall examine in this session.

Tidy Data

Tidy data is about structuring your data so that:

  1. Each variable has its own column

  2. Each observation has its own row

  3. Each type of observation forms a table.

You will learn how to convert your data to tidy format.
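
As a minimal sketch of what such a conversion can look like, the example below reshapes a small made-up table with pivot_longer() from tidyr. The data frame untidy and its column names are assumptions for illustration only, not data from the course.

```r
library(tidyr)

# Hypothetical untidy table: one column per year, so the variable "year"
# is hidden in the column names rather than stored in a column of its own.
untidy <- tibble(
  country = c("A", "B"),
  `2023`  = c(10, 20),
  `2024`  = c(12, 25)
)

# pivot_longer() gathers the year columns into two variables:
# `year` (taken from the column names) and `cases` (the cell values).
tidy <- untidy %>%
  pivot_longer(cols = c(`2023`, `2024`),
               names_to = "year",
               values_to = "cases")

tidy
```

After the reshape, each row is one country-year observation and each variable has its own column.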

Data Visualisation using ggplot2

Use ggplot() to produce scatter plots, boxplots, histograms, barplots, line plots, etc. In the template, data, x_var and y_var are placeholders for your own data frame and column names.

ggplot(data,                                   # (1)
       mapping = aes(x = x_var, y = y_var)) +  # (2)
  geom_point() +                               # (3)
  labs(x = "x-axis label", y = "y-axis label",
       title = "Plot title")                   # (4)

  1. Start by specifying the tidy data you’re plotting from.
  2. Use the aesthetic mapping to define which variables go on the axes, colours, size, etc.
  3. Add a geom layer (e.g., geom_point(), geom_bar(), geom_line()).
  4. Add labels for the axes or a title to make your plot more readable.
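
The four steps can be sketched on a concrete dataset. The example below assumes the mpg data frame that ships with ggplot2 as a stand-in for your own tidy data; the axis labels and title are illustrative choices.

```r
library(ggplot2)

# Assumption: mpg (an example dataset bundled with ggplot2) stands in
# for whatever tidy data frame you are plotting.
p <- ggplot(data = mpg,                              # the data
            mapping = aes(x = displ, y = hwy)) +     # variables on the axes
  geom_point() +                                     # scatter plot layer
  labs(x = "Engine displacement (litres)",           # readable labels
       y = "Highway miles per gallon",
       title = "Fuel efficiency against engine size")

p  # printing the object draws the plot
```

Swapping geom_point() for another geom, such as geom_boxplot() or geom_line(), changes the plot type while the rest of the call keeps the same shape.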

Data wrangling

Use tools from the dplyr package (included in tidyverse) to perform data wrangling, which includes transforming, mapping and summarising variables, chaining the steps together with the pipe operator %>%.

Select Columns

data %>% select(column_1, column_3, column_6)
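
A minimal sketch of select() on a small made-up data frame; students and its columns are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
students <- tibble(
  id    = 1:3,
  name  = c("Ana", "Ben", "Cat"),
  mark  = c(62, 71, 55),
  group = c("A", "B", "A")
)

# Keep two columns by name.
kept <- students %>% select(name, mark)

# A minus sign drops a column and keeps everything else.
dropped <- students %>% select(-id)

kept
dropped
```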

Filter observations

data %>% filter(year > 2025 & month %in% c("Jan", "Feb", "Mar"))
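
The same kind of filter, sketched on a small made-up table; sales and its values are assumptions for illustration.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
sales <- tibble(
  year  = c(2025, 2026, 2026, 2026),
  month = c("Jan", "Feb", "Jun", "Mar"),
  units = c(5, 8, 3, 9)
)

# Both conditions must hold for a row to be kept.
# Separating conditions with a comma is equivalent to joining them with &.
recent_q1 <- sales %>%
  filter(year > 2025, month %in% c("Jan", "Feb", "Mar"))

recent_q1  # keeps the Feb 2026 and Mar 2026 rows
```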

Create/modify variables

data %>% mutate(new_variable = existing_var + 1) 
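
A short sketch of mutate() on made-up data; scores, the 5-point bonus and the pass mark of 50 are hypothetical. It also shows that a column created in a mutate() call can be used later in the same call.

```r
library(dplyr)

# Hypothetical data frame used only for illustration.
scores <- tibble(
  student = c("Ana", "Ben"),
  mark    = c(62, 40)
)

scores_new <- scores %>%
  mutate(bonus_mark = mark + 5,          # new variable from an existing one
         passed     = bonus_mark >= 50)  # uses the column created just above

scores_new
```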

Summarise variables

data %>% summarise(mean_x = mean(x, na.rm = TRUE))

Grouping structure

data %>% summarise(mean_x = mean(x, na.rm = TRUE), .by = group)
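
To see the difference between an overall and a grouped summary, here is a minimal sketch on made-up data (measurements and its values are assumptions). Note that the .by argument needs dplyr 1.1.0 or later; with older versions, group_by(group) followed by summarise() gives the same grouping.

```r
library(dplyr)

# Hypothetical data frame used only for illustration; one value is missing.
measurements <- tibble(
  group = c("A", "A", "B", "B"),
  x     = c(2, 4, NA, 9)
)

# One row for the whole table; na.rm = TRUE ignores the missing value.
overall <- measurements %>%
  summarise(mean_x = mean(x, na.rm = TRUE))

# One row per group.
by_group <- measurements %>%
  summarise(mean_x = mean(x, na.rm = TRUE), .by = group)

overall   # mean_x = 5
by_group  # A: 3, B: 9
```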

Joining data frames

data_1 %>%
  inner_join(data_2,
             by = join_by(key_var))
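
A minimal sketch of a join on two made-up tables that share a key column; students, marks and id are hypothetical names. left_join() is included for contrast, since it keeps unmatched rows from the first table. join_by() needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical tables used only for illustration; `id` is the shared key.
students <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cat"))
marks    <- tibble(id = c(2, 3, 4), mark = c(71, 55, 80))

# inner_join() keeps only the keys present in both tables (ids 2 and 3).
inner <- students %>%
  inner_join(marks, by = join_by(id))

# left_join() keeps every row of `students`; Ana has no mark, so it is NA.
left <- students %>%
  left_join(marks, by = join_by(id))

inner
left
```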

A last thing…

Note

To further enhance your data analysis skills, check out the additional material provided on handling date-time data.

  • 🕒 Work at your own pace or in groups — the notes and exercises are designed for flexible, self-guided learning.

  • 💬 Ask for help whenever you need it — we’re here to support you.

  • 🧠 Focus on understanding the concepts, not just completing tasks.

  • 🏠 Didn’t finish? No problem! You’re encouraged to take the exercises home and revisit them later.